# Reinforcement Learning in Safety Gym

## References
- https://github.com/openai/safety-gym
- https://github.com/openai/safety-starter-agents
- https://github.com/akjayant/PPO_Lagrangian_PyTorch
- https://github.com/jjyyxx/srlnbc

## Before Running
This code uses utilities from safety-gym and safety-starter-agents (see References). Install both before running the commands below.

## Pretraining & Internal Test Data
- Pretrain the PPO agent:
    - python ppo_point_lag1.5_checkpoint.py
- Pretrain the classifier with the cross-entropy loss:
    - python ppo_point_train_ce.py --mode=ppo_point_lag1.5
- Collect the internal test data:
    - python ppo_point_collectintest.py --mode=ppo_point_lag1.5

## Training
- Framework Training:
    - g++ ppo_point_train_grad.cpp -o ppo_point_train_lag1.5.out
    - g++ run_c_codes.cpp -o ppo_point_train_lag1.5.exe -lpthread
    - ./ppo_point_train_lag1.5.exe
    - python ppo_point_train.py --mode=ppo_point_lag1.5 --checkpoint=10000
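
The framework-training commands above can be chained from one small driver so that a failed compile stops the sequence. The following is a minimal sketch, not part of this repository: `TRAIN_STEPS` and `run_steps` are hypothetical names, the commands are copied verbatim from the list, and it assumes they are meant to run in the listed order from the repository root.

```python
import subprocess

# Commands copied verbatim from the "Framework Training" list, in the listed order.
TRAIN_STEPS = [
    ["g++", "ppo_point_train_grad.cpp", "-o", "ppo_point_train_lag1.5.out"],
    ["g++", "run_c_codes.cpp", "-o", "ppo_point_train_lag1.5.exe", "-lpthread"],
    ["./ppo_point_train_lag1.5.exe"],
    ["python", "ppo_point_train.py", "--mode=ppo_point_lag1.5", "--checkpoint=10000"],
]


def run_steps(steps, dry_run=True):
    """Print each command; unless dry_run is set, also execute it.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so later steps never run after a failed one.
    Returns the number of steps processed.
    """
    for argv in steps:
        print("$", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return len(steps)


if __name__ == "__main__":
    # Dry run by default: only prints the commands (hypothetical usage).
    run_steps(TRAIN_STEPS, dry_run=True)
```

Calling `run_steps(TRAIN_STEPS, dry_run=False)` would actually compile and launch the training, which requires `g++` and the dependencies listed under References.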

## Validation
- Baseline Validation: 
    - python ppo_point_val_base.py --exp_name=baseval_pretraining_ppo_point_lag1.5 --checkpoint=10000
- Framework Validation:
    - g++ ppo_point_val_pof.cpp -o ppo_point_val_pof.out
    - g++ ppo_point_val_parallel.cpp -o ppo_point_val_parallel.exe -lpthread
    - ./ppo_point_val_parallel.exe pofval_poftraining_ppo_point_lag1.5 999
    - python ppo_point_val_pof.py --exp_name=pofval_poftraining_ppo_point_lag1.5 --checkpoint=999
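
In the framework-validation commands above, the same experiment name and checkpoint appear twice: once as positional arguments to `ppo_point_val_parallel.exe` and once as `--exp_name` / `--checkpoint` for `ppo_point_val_pof.py`. The sketch below builds all four commands from a single pair of values so the two cannot drift apart. It is an illustration only: `framework_validation_commands` is a hypothetical helper, and reading the two positional arguments as experiment name and checkpoint is an inference from the Python call that follows them.

```python
import shlex


def framework_validation_commands(
    exp_name="pofval_poftraining_ppo_point_lag1.5", checkpoint=999
):
    """Return the framework-validation steps as argv lists, in the listed order.

    Assumption: the executable's two positional arguments are the experiment
    name and the checkpoint, mirroring the flags of ppo_point_val_pof.py.
    """
    ckpt = str(checkpoint)
    return [
        ["g++", "ppo_point_val_pof.cpp", "-o", "ppo_point_val_pof.out"],
        ["g++", "ppo_point_val_parallel.cpp", "-o",
         "ppo_point_val_parallel.exe", "-lpthread"],
        ["./ppo_point_val_parallel.exe", exp_name, ckpt],
        ["python", "ppo_point_val_pof.py",
         f"--exp_name={exp_name}", f"--checkpoint={ckpt}"],
    ]


if __name__ == "__main__":
    # Print the commands in copy-pasteable shell form; nothing is executed.
    for argv in framework_validation_commands():
        print(shlex.join(argv))
```

With the default arguments, the printed lines match the four commands in the list; passing a different `checkpoint` updates both places at once.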

